{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 15 - Hypothesis testing of Proportions continued\n", "\n", "We will use the green taxi trip data for this lab.\n", "\n", "First, let's import the necessary libraries." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load your green taxi trip data into the dataframe `taxi`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering taxi trips\n", "\n", "Suppose we are only interested in taxi trips that either start or end near Lehman College. To get those trips, filter the data to keep only the trips whose pick-up location (`PULocationID`) or drop-off location (`DOLocationID`) is 18 (Bedford Park) or 136 (Kingsbridge).\n", "\n", "Store these trips in a dataframe called `taxi_lehman`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many such trips are there?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Testing proportion of long trips\n", "\n", "In the winter, the proportion of trips to or from Lehman that are longer than 3 miles is 0.5.\n", "\n", "We will test the following hypothesis:\n", "\n", "Null hypothesis: The proportion of trips to or from Lehman on Sept. 3 that are longer than 3 miles is also 0.5.\n", "\n", "Alternative hypothesis: The proportion of trips to or from Lehman on Sept. 3 that are longer than 3 miles is not 0.5.\n", "\n", "First, let's calculate the proportion of trips in `taxi_lehman` that are longer than 3 miles." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we simulate data according to the null hypothesis (as in Lab 14).\n", "\n", "First, simulate 53 trips (or however many trips are in `taxi_lehman`), where each trip is long with probability 0.5 and short otherwise." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Count the number of long trips in your simulated data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Repeat the two steps above (simulating a data set and counting its long trips) 10,000 times, saving each count." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot your simulated counts of long trips as a histogram." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, compare the actual number of long trips in `taxi_lehman` to the histogram. Does the data appear to have come from this distribution? If so, we fail to reject the null hypothesis. If not, we can reject the null hypothesis and accept the alternative hypothesis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparing proportions\n", "\n", "Let's return to our hypothesis from Lab 12: Taxis with more than 1 passenger take longer trips on average than taxis with 1 passenger.\n", "\n", "Suppose we know that the proportion of trips taken by 1 passenger that are longer than 3 miles is 0.312. 
We want to know whether the proportion of trips taken by 2 or more passengers that are longer than 3 miles is greater.\n", "\n", "Null hypothesis: The proportion of trips longer than 3 miles with 1 passenger is the same as or greater than the proportion of trips longer than 3 miles with 2 or more passengers.\n", "\n", "Alternative hypothesis: The proportion of trips longer than 3 miles with 1 passenger is less than the proportion of trips longer than 3 miles with 2 or more passengers.\n", "\n", "### Testing the hypothesis\n", "\n", "First, from our data, let's compute the proportion of trips taken by 2 or more passengers that are longer than 3 miles." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we simulate the data according to the null hypothesis.\n", "\n", "First, count how many trips have 2 or more passengers. This will be our sample size." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, simulate that many trips, with each trip having a probability of 0.312 of being longer than 3 miles." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Count the number of long trips in your simulated data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Repeat the two steps above (simulating a data set and counting its long trips) 1,000 times, saving each count. (10,000 times may take too long, since our sample size is bigger than usual.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot your simulated counts of long trips as a histogram." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, compare the actual number of long trips with 2 or more passengers to the histogram. Does the data appear to have come from this distribution? If so, we fail to reject the null hypothesis. If not, we can reject the null hypothesis and accept the alternative hypothesis." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }